Goto

Collaborating Authors

 optimal control problem



Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions

Bhole, Ajinkya, Filabadi, Mohammad Mahmoudi, Crevecoeur, Guillaume, Lefebvre, Tom

arXiv.org Artificial Intelligence

This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights, thereby generalizing the standard trajectory-level KL-regularization commonly used in probabilistic and KL-regularized control. This generalized formulation acts as a generative structure allowing to recover various control problems. These include the classical Stochastic Optimal Control (SOC), Risk-Sensitive Optimal Control (RSOC), and their policy-based KL-regularized counterparts. The latter we refer to as soft-policy SOC and RSOC, facilitating alternative problems with tractable solutions. Beyond serving as regularized variants, we show that these soft-policy formulations majorize the original SOC and RSOC problem. This means that the regularized solution can be iterated to retrieve the original solution. Furthermore, we identify a structurally synchronized case of the risk-seeking soft-policy RSOC formulation, wherein the policy and transition KL-regularization weights coincide. Remarkably, this specific setting gives rise to several powerful properties such as a linear Bellman equation, path integral solution, and, compositionality, thereby extending these computationally favourable properties to a broad class of control problems.


Learning Dynamics from Infrequent Output Measurements for Uncertainty-Aware Optimal Control

Lefringhausen, Robert, Springer, Theodor, Hirche, Sandra

arXiv.org Artificial Intelligence

Abstract: Reliable optimal control is challenging when the dynamics of a nonlinear system are unknown and only infrequent, noisy output measurements are available. This work addresses this setting of limited sensing by formulating a Bayesian prior over the continuous-time dynamics and latent state trajectory in state-space form and updating it through a targeted marginal Metropolis-Hastings sampler equipped with a numerical ODE integrator. The resulting posterior samples are used to formulate a scenario-based optimal control problem that accounts for both model and measurement uncertainty and is solved using standard nonlinear programming methods. The approach is validated in a numerical case study on glucose regulation using a Type 1 diabetes model. Keywords: Probabilistic and Bayesian methods for system identification, Nonlinear system identification, Time series modeling, Statistical inference, Learning methods for optimal control, Model predictive control, Data-driven control theory 1. INTRODUCTION Accurate dynamical models are fundamental for the predictive and optimal control of nonlinear systems. Although first-principles models may describe the general structure of many systems, important parameters or effects often remain unknown, limiting their direct use for control.


Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with Multiplicative Noise

Diaz, Gabriel, Li, Lucky, Zhang, Wenhao

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has emerged as a powerful framework for sequential decision-making in dynamic environments, particularly when system parameters are unknown. This paper investigates RL-based control for entropy-regularized linear-quadratic (LQ) control problems with multiplicative noise over an infinite time horizon. First, we adapt the regularized policy gradient (RPG) algorithm to stochastic optimal control settings, proving that despite the non-convexity of the problem, RPG converges globally under conditions of gradient domination and almost-smoothness. Second, based on zero-order optimization approach, we introduce a novel model free RL algorithm: Sample-based regularized policy gradient (SB-RPG). SB-RPG operates without knowledge of system parameters yet still retains strong theoretical guarantees of global convergence. Our model leverages entropy regularization to address the exploration versus exploitation trade-off inherent in RL. Numerical simulations validate the theoretical results and demonstrate the efficiency of SB-RPG in unknown-parameters environments.


A Review of Pseudospectral Optimal Control: From Theory to Flight

Ross, I. M., Karpenko, M.

arXiv.org Artificial Intelligence

The home space for optimal control is a Sobolev space. The home space for pseudospectral theory is also a Sobolev space. It thus seems natural to combine pseudospectral theory with optimal control theory and construct ``pseudospectral optimal control theory,'' a term coined by Ross. In this paper, we review key theoretical results in pseudospectral optimal control that have proven to be critical for a successful flight. Implementation details of flight demonstrations onboard NASA spacecraft are discussed along with emerging trends and techniques in both theory and practice. The 2011 launch of pseudospectral optimal control in embedded platforms is changing the way in which we see solutions to challenging control problems in aerospace and autonomous systems.



Barrier-Riccati Synthesis for Nonlinear Safe Control with Expanded Region of Attraction

Almubarak, Hassan, AL-Sunni, Maitham F., Dubbin, Justin T., Sadegh, Nader, Dolan, John M., Theodorou, Evangelos A.

arXiv.org Artificial Intelligence

We present a Riccati-based framework for safety-critical nonlinear control that integrates the barrier states (BaS) methodology with the State-Dependent Riccati Equation (SDRE) approach. The BaS formulation embeds safety constraints into the system dynamics via auxiliary states, enabling safety to be treated as a control objective. To overcome the limited region of attraction in linear BaS controllers, we extend the framework to nonlinear systems using SDRE synthesis applied to the barrier-augmented dynamics and derive a matrix inequality condition that certifies forward invariance of a large region of attraction and guarantees asymptotic safe stabilization. The resulting controller is computed online via pointwise Riccati solutions. We validate the method on an unstable constrained system and cluttered quadrotor navigation tasks, demonstrating improved constraint handling, scalability, and robustness near safety boundaries. This framework offers a principled and computationally tractable solution for synthesizing nonlinear safe feedback in safety-critical environments.



Hessians in Birkhoff-Theoretic Trajectory Optimization

Ross, I. M.

arXiv.org Artificial Intelligence

This paper derives various Hessians associated with Birkhoff-theoretic methods for trajectory optimization. According to a theorem proved in this paper, approximately 80% of the eigenvalues are contained in the narrow interval [-2, 4] for all Birkhoff-discretized optimal control problems. A preliminary analysis of computational complexity is also presented with further discussions on the grand challenge of solving a million point trajectory optimization problem.